6 research outputs found

    Winsorize tree algorithm for handling outliers in classification problem

    Classification and Regression Tree (CART) is designed to predict or classify objects into predetermined classes from a set of predictors. However, outliers can affect the structure of a CART, its node purity, and its predictive accuracy in classification. Some researchers handle outliers by pre-pruning or post-pruning the tree. This study proposes a modified classification tree algorithm, called Winsorize tree, based on the distribution of classes in the training dataset. The Winsorize tree checks for possible outliers at every node before evaluating candidate split points, so as to obtain the nodes with the highest purity. The upper and lower fences of a boxplot are used to detect potential outliers, that is, values lying more than 1.5 × the interquartile range beyond the quartiles (below Q1 − 1.5×IQR or above Q3 + 1.5×IQR). The identified outliers are neutralized using the Winsorize method, and the Winsorize Gini index is then used to compute the divergences among the probability distributions of the target predictor’s values until the stopping criteria are met. This study uses three stopping rules: node achieved the minimum 10% of total training set
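The fence-based detection and neutralization step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `boxplot_fences` and `winsorize` are hypothetical, the quartiles use the "inclusive" interpolation from Python's standard library, and outliers are assumed to be clamped to the fence values themselves (the abstract does not say which replacement value the Winsorize tree uses).

```python
from statistics import quantiles


def boxplot_fences(values, k=1.5):
    """Return (lower, upper) boxplot fences: Q1 - k*IQR and Q3 + k*IQR."""
    # Assumption: "inclusive" quartile interpolation; other conventions exist.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def winsorize(values, k=1.5):
    """Clamp every value outside the fences to the nearest fence."""
    # Assumption: outliers are replaced by the fence value itself.
    lower, upper = boxplot_fences(values, k)
    return [min(max(v, lower), upper) for v in values]


if __name__ == "__main__":
    node_values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
    print(boxplot_fences(node_values))
    print(winsorize(node_values))
```

In a tree setting, a step like this would presumably be applied to the predictor values reaching each node before candidate splits are scored; the sketch covers only the per-node neutralization, not the Winsorize Gini index.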

    Comparing the performance of winsorize tree to other data mining techniques for cases involving outliers

    Winsorize tree is a modified tree derived from the classification and regression tree (CART). Its strategy is to handle and accommodate outliers in all nodes while generating the subsequent branches of the tree. Normally, the presence of outliers degrades the accuracy of most classifiers. Therefore, we propose the winsorize tree, which is resistant to anomalous data. It preserves the original data while performing the splitting process. In this study, the winsorize tree was compared with other classifiers. The results obtained from five real datasets indicate that, in terms of misclassification rate, the proposed winsorize tree performs as well as or better than the other data mining techniques.

    A study of graduate on time (GOT) for Ph.D students using decision tree model

    Over the years, there has been exponential growth in the number of Doctor of Philosophy (Ph.D) graduates in most universities around the world. The increase in Ph.D students has raised concern among universities and government bodies about the students' ability to achieve the Graduate on Time (GOT) target stipulated by the university. Therefore, this study aims to classify Ph.D students into “GOT achiever” and “non-GOT achiever” groups using decision tree models. Historical data on all Ph.D students in a public university in Malaysia were obtained directly from the Graduate Academic Information System (GAIS) database in order to develop and compare the performance of four decision tree models (Chi-square algorithm, Gini index algorithm, Entropy algorithm and an interactive decision tree). The results from the four models show that English background, gender and the students’ entry Cumulative Grade Point Average (CGPA) are the core attributes affecting the students’ success. Among all models, the decision tree with the Entropy algorithm performs best, scoring the highest accuracy rate (72%) and sensitivity rate (95%). It has therefore been selected as the best model for predicting the ability of Ph.D students to achieve GOT. The outcome can ease the burden on universities in handling and controlling the GOT issue. The model can also be used by the university to uncover the constraints behind this issue, so that better plans can be carried out to increase the number of GOT achievers in future.

    Foreign Direct Investment Volatility And Economic Growth In Asean-Five Countries

    This study examines the role of foreign direct investment (FDI) volatility as a source of variability in five major ASEAN economies. Using the bounds testing approach, we show that while FDI has a positive and significant effect in all the ASEAN economies considered, its volatility retards long-run economic growth in Indonesia, Malaysia, the Philippines and Thailand. Moreover, FDI volatility can be welfare-reducing even after controlling for other country-specific growth correlates. This finding is robust to different measures of FDI volatility.

    The stopping rules for winsorized tree

    Winsorized tree is a modified tree-based classifier that is able to investigate and handle all outliers in all nodes throughout the process of constructing the tree. It overcomes the tedious process of constructing a classical tree by letting the splitting of branches and pruning proceed concurrently, so that the constructed tree does not grow bushy. This mechanism is controlled by the proposed algorithm. In the winsorized tree, data are screened to identify outliers. If an outlier is detected, its value is neutralized using the winsorize approach. Both outlier identification and value neutralization are executed recursively in every node until a predetermined stopping criterion is met. The aim of this paper is to search for a significant stopping criterion that stops the tree from splitting further before it overfits. The results of the experiment conducted on the Pima Indian dataset showed that a node could produce the final successor nodes (leaves) once it had achieved the range of 70% in information gain.
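For readers unfamiliar with the quantity behind this stopping criterion, the following is a minimal sketch of the standard entropy-based information gain of a binary split. It is only an illustration: the names `entropy` and `information_gain` are hypothetical, and the abstract does not specify how the 70% range is measured or applied, so no threshold check is shown.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent node minus the size-weighted entropy of its children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


if __name__ == "__main__":
    parent = [0, 0, 1, 1]
    # A split that separates the two classes perfectly.
    print(information_gain(parent, [0, 0], [1, 1]))
    # A split that leaves both children as mixed as the parent.
    print(information_gain(parent, [0, 1], [0, 1]))
```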

    Analyzing the factors that influencing the success of post graduates in achieving graduate on time (GOT) using analytic hierarchy process (AHP)

    In the globalization era, education plays an important role in educating and preparing individuals to face the demands and challenges of the 21st century. This has contributed to an increase in the number of individuals pursuing Doctor of Philosophy (Ph.D) programs. However, the ability of Ph.D students to meet the four-year Graduate on Time (GOT) target stipulated by the university has become a major concern for students, institutions and government. Therefore, the main objective of this study is to investigate the factors that influence Ph.D students in Universiti Utara Malaysia (UUM) in achieving GOT. Through a review of previous research, six factors (student, financial, supervisor, skills, project and institution factors) were identified as the domain factors that influence Ph.D students in achieving GOT. The level of importance of each factor will be ranked by experts from three graduate schools using the Analytic Hierarchy Process (AHP) technique. This study will contribute significantly to the understanding of the factors that affect Ph.D students in UUM in achieving GOT. In addition, it can help the university in planning for and assisting Ph.D students to achieve GOT in future.